 software log


SDLog: A Deep Learning Framework for Detecting Sensitive Information in Software Logs

arXiv.org Artificial Intelligence

Software logs are messages recorded during the execution of a software system that provide crucial run-time information about events and activities. Although software logs play a critical role in software maintenance and operation tasks, publicly accessible log datasets remain limited, hindering advances in log analysis research and practices. The presence of sensitive information, particularly Personally Identifiable Information (PII) and quasi-identifiers, introduces serious privacy and re-identification risks, discouraging the publishing and sharing of real-world logs. In practice, log anonymization techniques primarily rely on regular expression patterns, which involve manually crafting rules to identify and replace sensitive information. However, these regex-based approaches suffer from significant limitations, such as extensive manual effort and poor generalizability across diverse log formats and datasets. To mitigate these limitations, we introduce SDLog, a deep learning-based framework designed to identify sensitive information in software logs. Our results show that SDLog overcomes regex limitations and outperforms the best-performing regex patterns in identifying sensitive information. With only 100 fine-tuning samples from the target dataset, SDLog can correctly identify 99.5% of sensitive attributes and achieves an F1-score of 98.4%. To the best of our knowledge, this is the first deep learning alternative to regex-based methods in software log anonymization.
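
For readers unfamiliar with the regex-based baseline that the abstract contrasts SDLog with, the following is a minimal sketch of that style of anonymization. The rule names, the patterns, and the `anonymize` helper are illustrative assumptions; they are not taken from the SDLog paper or from any particular tool.

```python
import re

# Hypothetical hand-written rules (assumptions for illustration only).
# Each rule pairs a placeholder label with a manually crafted pattern.
RULES = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("IPV4", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("MAC", re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b")),
]


def anonymize(line: str) -> str:
    """Replace every match of each rule with a <LABEL> placeholder."""
    for label, pattern in RULES:
        line = pattern.sub(f"<{label}>", line)
    return line


if __name__ == "__main__":
    print(anonymize("Login from 10.0.0.5 by alice@example.com"))
    # A username in a format no rule anticipates passes through unchanged.
    print(anonymize("user=alice logged in"))
```

The second call hints at the generalizability problem the abstract describes: any sensitive value whose format was not foreseen by a rule author is left in the log.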


Convolutional vs Large Language Models for Software Log Classification in Edge-Deployable Cellular Network Testing

arXiv.org Artificial Intelligence

Software logs generated by sophisticated network emulators in the telecommunications industry, such as VIAVI TM500, are extremely complex, often comprising tens of thousands of text lines with minimal resemblance to natural language. Only specialised expert engineers can decipher such logs and troubleshoot defects in test runs. While AI offers a promising solution for automating defect triage, potentially leading to massive revenue savings for companies, state-of-the-art large language models (LLMs) suffer from significant drawbacks in this specialised domain. These include a constrained context window, limited applicability to text beyond natural language, and high inference costs. To address these limitations, we propose a compact convolutional neural network (CNN) architecture that offers a context window spanning up to 200,000 characters and achieves over 96% accuracy (F1>0.9) in classifying multifaceted software logs into various layers in the telecommunications protocol stack. Specifically, the proposed model is capable of identifying defects in test runs and triaging them to the relevant department, formerly a manual engineering process that required expert knowledge. We evaluate several LLMs (LLaMA2-7B, Mixtral 8x7B, Flan-T5, BERT and BigBird) and experimentally demonstrate their shortcomings in our specialised application. Despite being lightweight, our CNN significantly outperforms LLM-based approaches in telecommunications log classification while minimising the cost of production. Our defect triaging AI model is deployable on edge devices without dedicated hardware and widely applicable across software logs in various industries.


Coralogix raises $10 million to apply AI to software logs

#artificialintelligence

Canvassing software logs is a tricky business when you're juggling multiple dev environments. About 50% of logging statements don't include any information about critical things like variable state at the time of an error, according to GitHub and OverOps surveys, which is perhaps why developers spend an estimated one-fourth of their time (more than a full day out of the work week) on troubleshooting. This unfortunate state of affairs motivated Lior Redlus, Ariel Assaraf, and Guy Kroupp to found Coralogix in 2014. The San Francisco-based startup provides AI-imbued analytics solutions addressing a host of software delivery and maintenance challenges. Its suite automatically clusters log records back to their patterns and identifies connections among those patterns, forming baseline flows for comparison and future study.
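
As a rough illustration of what "clustering log records back to their patterns" can mean, here is a minimal sketch that masks variable tokens and groups lines by the resulting template. This is a generic simplification, not Coralogix's actual method, which the article does not describe; the `to_template` and `cluster` helpers and the `<*>` wildcard are assumptions made for this example.

```python
import re
from collections import defaultdict


def to_template(line: str) -> str:
    """Mask variable parts of a log line so similar lines share one template."""
    # Mask hex literals first so that "0x1f" is not split by the digit rule.
    line = re.sub(r"0x[0-9a-fA-F]+", "<*>", line)
    # Mask remaining runs of digits (ids, counts, durations).
    return re.sub(r"\d+", "<*>", line)


def cluster(lines):
    """Group raw log lines by their masked template."""
    groups = defaultdict(list)
    for line in lines:
        groups[to_template(line)].append(line)
    return dict(groups)


if __name__ == "__main__":
    logs = [
        "Connected to node 12 in 30 ms",
        "Connected to node 7 in 45 ms",
        "Read failed at offset 0x1f",
    ]
    for template, members in cluster(logs).items():
        print(len(members), template)
```

Once lines are grouped this way, the frequency of each template over time could serve as a simple baseline against which unusual activity is compared.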